The guilt head sea bream (Sparus aurata) is not supported by the KEGG library meaning it is not possible to annotate differentially expressed genes to identify up or down-regulated pathways. We must find a KEGG-supported organism which shares enough homologous genes with Sparus aurata so we can map its genes onto our study species. Most commonly used KEGG-supported organisms is Danio rerio, hence we try to map Sparus aurata genes onto its genome and perform a first KEGG analysis as is.
The KEGG analysis on Danio rerio does not yield important results at this stage. This is likely the result of many differentially expressed genes in Guilthead sea bream not being mapped onto homologous genes in Zebrafish. Consequently, the KEGG enrichment analysis may yield both false positives and false negatives. I developed an algorithm to determine which KEGG-supported organism is the most suitable to use for performing a KEGG enrichment analysis on a non-model organism. First, I extracted all homologous genes known for Sparus aurata into other organisms. The search was performed using the up-to-date NCBI database.
Turns out, the KEGG-supported organism with the best homologous genes (on average) is Oryzias latipes so I re-ran a KEGG enrichment test, hoping it would yield better results than using Zebrafish KEGG annotations.
Overall, KEGG enrichment test performed using Danio rerio or Oryzias latipes annotations agree. There is a general increase in pathways relating to energy production, cytoskeleton and ribosomal activity (expected in developing organs).
Even though Oryzias latipes is the species allowing the most exhaustive mapping of genes of Sparus aurata, almost 50% of the genes we try to map still do not find correspondence in the KEGG-supported organisms genomes. In addition, several of the homologous genes in Oryzias latipes are deemed low-quality homologs of Sparus aurata, suggesting that some of them may not share the same functions between the two species.
In an effort to look for a KEGG-supported species which covers Sparus aurata genome well but also has reliable homologous genes, I checked how all the differentially expressed genes in our dataset map on the different KEGG-supported organisms. Out of 1503 differentially expressed genes of Sparus aurata, 1435 found reliable homologous genes in various KEGG-supported species.
When isolating the genes of interest (i.e., differentially expressed genes) and we look for the best homologous genes across all species carrying a known homolog, NA appears to carry most of them. It is also KEGG-supported, meaning a KEGG-analysis on this species rather than on Danio rerio is more likely to show meaningful pathways. (The heatmap and the sankey plot both show the same result.)
Regardless of which species we use for the KEGG annotations, a substantial amount of our genes are left out of the analysis. Hence, instead of looking for only one KEGG-supported species that shares many homologous genes with Sparus aurata, I annotate the differentially expressed genes using multiple KEGG-supported organisms.
Two strategies are possible at this stage: 1) we can perform a classic KEGG-enrichment analysis on Seriola dumerili and sacrifice some of our genes of interest in the process, or 2) we can try to annotate as much of the Sparus aurata genes onto multiple species and run a custom enrichment analysis.
For all the known homologous genes found for Sparus aurata, I check if there is a KEGG annotation available in the KEGG database. For each gene of Sparus aurata we want to annotate, several homologous genes have similar (if not identical) quality which means we must choose one over the others to include in the custom KEGG-analysis. If we were to keep them all, we would bias the test towards these genes. I make the decision that if two homologs are ex-aequo in terms of quality, I pick the homolog which species ranked higher in part 2.
This multi-species annotations do not return more precise results than what has been observed when mapping Sparus aurata genes onto Danio rerio or Oryzias latipes.
Although the multi-species analysis is a nice approach, it is experimental and needs to be 1) stabilized, and 2) benchmarked. Until the algorithm is refined, I am also using the classic KEGG analysis on Seriola dumerili. The vertical red dashed line shows the statistical significance of alpha = 0.05 whereas the vertical red dotted line shows the statistical significance of alpha = 0.1.
Selecting for the KEGG-supported organism with the most robust functional homologs to Sparus aurata genes (i.e., Seriola dumerili) shows similar results to other KEGG analyses (up-regulated ribosome, oxydative phosphorylation, ECM-receptor activity, cardiac muscle contraction)
It also highlights the downregulation of the Wnt signalling pathway at the level (alpha = 0.05), which was associated with fish masculinization in the literature. Genes differentially expressed on our dataset relating to this pathway are: rhoab, , wnt5b, vangl2, tle2a, gbp, csnk1a1, tcf7l1a, lrp6, RAC1, psen1, chd8, CTNNB1 plus some uncharacterized locis ENSSAUG00010005638, ENSSAUG00010012826, ENSSAUG00010023190.
There is weak evidence (alpha = 0.1), that pathways such as Oocyte meiosis, Progesterone-mediated oocyte maturation, and Cellular senescence are also down-regulated during the development of Sparus aurata gonads. Genes involved in thes pathways are:
Oocyte meiosis: YWHAG, mad1l1, ccne1, , itpr2, YWHAB, cdc20, ccne2, ywhae2, espl1, ccnb2, ccnb1, calm1a, cdk2, mapk12b, ITPR2 plus some uncharacterized locis ENSSAUG00010004114, ENSSAUG00010009404, ENSSAUG00010018214, ENSSAUG00010023190
Progesterone-mediated oocyte maturation: mad1l1, hsp90ab1, , CDC25B, ccna2, ccnb2, ccnb1, gnaia, cdk2, kif22, mapk12b, hsp90aa1.2 plus some uncharacterized locis ENSSAUG00010005793
Cellular senescence: ccne1, itpr2, , zfp36l2, ccne2, ccna2, ccnb2, si:ch211-160o17.4, ccnb1, calm1a, vdac1, cdk2, mapk12b, mybl2b, HIPK1, ITPR2 plus some uncharacterized locis ENSSAUG00010005793, ENSSAUG00010012826
To verify the relevance of the genes found in the KEGG analysis, I search databases of gene co-expression and interaction and cross these data with modules of co-expression found in our dataset. The goal is to find if 1) specific differentially expressed genes or 2) groups of genes which co-express in our dataset are also known to interact or co-express with genes known to affect sex-determination in other fish species. In other words, we want to know if the genes in our dataset (and the ones found in the KEGG analyses) have known association with sex-determination genes.
The first dataset I scanned was the GENEMANIA database. It is easier to work with than other SQL-based databases. However, its content was passed into the private domain in 2015 and it has not been updated since 2013. So although known associations are unlikely to be disproved, it lacks new associations that were documented these passed 10 years or so. It also only works on model organisms which forces us to work with Danio rerio genes.
Here you see the network of genes from Danio rerio which have known genetic and physical interaction patterns. Each node is a gene and the connections between them mean that they were found to interact or co-express in some context. If the node is blue, it means this gene was found to be down-regulated in our dataset for Sparus aurata. If the node is in red, the gene was up-regulated. If the node is in grey, it was neither up nor down regulated, or was not part of our dataset to begin with.
Here I plot the network of co-expression between the genes of Sparus aurata in our dataset. The network was built using the WGCNA method in R. The technique has its weaknesses, but it is widely used in the field of transcriptomics. Again, genes in blue are down-regulated during gonad development, genes in red are up-regulated during gonad development, genes in grey were not differentially expressed (I remove most of them for clarity).
I crossed known patterns of interaction (GENEMANIA) with the patterns of interaction we found in the network analysis (WGCNA). I also add to the plot ‘special genes’: 1) the genes involved in the KEGG pathways relating to sex-determination in Seriola dumerili (indicated as boxes) and 2) genes known to affect sex-determination in Danio rerio (indicated as stars). For each one of them, I added how they connect to the differentially expressed genes in the Sparus aurata dataset. This way you can see which Sparus aurata genes are somehow connected to sex-determination genes and how known sex-determining genes co-express with genes in your dataset.For instance, if a red star (or a group of red stars) connect to a group of blue genes in the network, then it means that the down-regulated genes of this cluster are somewhat related to sex-determination in Sparus aurata.
Since data were collected from organs during differentiation, it is normal to detect differentially expressed genes which drive symmetry, cell energy production, focal adhesion etc. But if a group of genes somewhat includes many genes from a sex-determining KEGG pathway and connects a lot to genes involved in sex-determination, then it suggests that these genes are involved in sex-determination in your dataset. For instance, sox32 is down-regulated in the dataset and it connects to wnt10a which was also reported in the literature as promoting masculinization in Zebrafish when down-regulated. And wnt10a was reported in the literature to co-express with apoa4a, an apolipoprotein transporter involved in the accumulation of DHA in female zebrafish ovary devolpment (https://doi.org/10.1016/j.cbpa.2019.04.005). sox32 also co-express with two clusters of down-regulated genes, one of which carries several genes involved in one of the down-regulated KEGG pathways involved in sex-determination (ccna, ccnb1, ccnb2, cdc20). Also, BDNF, although not expressed in your samples is known to interact with two genes differentially expressed in Sparus aurata gonads: larp6a (down-regulated) which is required for adequate oocyte development in Danio rerio (https://doi.org/10.1242/dev.187385). ogbhd co-expresses with dmrt1 in zebrafish, and bik is involved in poor breeding performances in guilthead seabream. Both co-express with a cluster of up-regulated ribosomal genes.
The networks are interactive, dig around to see if some of the genes there are familiar to you, or if any association seems relevant.